home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
-
- An Overview of Version 5 of the Icon
- Programming Language
-
-
-
-
- 1. Introduction
-
- Icon is a high-level programming language with extensive
- facilities for processing strings and lists. Icon has several
- novel features, including expressions that may produce sequences
- of results, goal-directed evaluation that automatically searches
- for a successful result, and string scanning that allows opera-
- tions on strings to be formulated at a high conceptual level.
-
- Icon resembles SNOBOL4 [1] in its emphasis on high-level
- string processing and a design philosophy that allows ease of
- programming and short, concise programs. Like SNOBOL4, storage
- allocation and garbage collection are automatic in Icon, and
- there are few restrictions on the sizes of objects. Strings,
- lists, and other structures are created during program execution
- and their size does not need to be known when a program is writ-
- ten. Values are converted to expected types automatically; for
- example, numeral strings read in as input can be used in numeri-
- cal computations without explicit conversion. Whereas SNOBOL4
- has a pattern-matching facility that is separate from the rest of
- the language, string scanning is integrated with the rest of the
- language facilities in Icon. Unlike SNOBOL4, Icon has an
- expression-based syntax with reserved words; in appearance, Icon
- programs resemble those of several other conventional programming
- languages.
-
- Examples of the kinds of problems for which Icon is well
- suited are:
-
- + text analysis, editing, and formatting
-
- + document preparation
-
- + symbolic mathematics
-
- + text generation
-
- + parsing and translation
-
- + data laundry
-
- + graph manipulation
-
- Version 5 of Icon is implemented in C [2]. There are UNIX*
- implementations for the AT&T 3B20, PDP-11, Ridge 32, Sun
-
- *UNIX is a trademark of AT&T Bell Laboratories.
-
-
-
-
- - 1 -
-
-
-
-
-
-
-
-
- Workstation, and VAX-11. There also is a VMS implementation for
- the VAX-11 and an MS-DOS implementation for personal computers.
- Other implementations are in progress.
-
- An earlier version of Icon [3] is available on several large-
- scale computers, including the CRAY-1, DEC-10, IBM 360/370, PRIME
- 450/550/650, and CDC Cyber/6000.
-
- A brief description of some of the representative features of
- Icon is given in the following sections. This description is not
- rigorous and does not include many features of Icon. See [4] for
- a complete description of Version 5 and [5] for a description of
- recent additions to the language.
-
-
- 2. Strings
-
- Strings of characters may be arbitrarily long, limited only by
- the architecture of the computer on which Icon is implemented. A
- string may be specified literally by enclosing it in double quo-
- tation marks, as in
-
- greeting := "Hello world"
-
- which assigns an 11-character string to greeting, and
-
- address := ""
-
- which assigns the zero-length empty string to address. The
- number of characters in a string s, its size, is given by *s. For
- example, *greeting is 11 and *address is 0.
-
- Icon uses the ASCII character set, extended to 256 characters.
- There are escape conventions, similar to those of C, for
- representing characters that cannot be keyboarded.
-
- Strings also can be read in and written out, as in
-
- line := read()
-
- and
-
- write(line)
-
- Strings can be constructed by concatenation, as in
-
- element := "(" || read() || ")"
-
- If the concatenation of a number of strings is to be written out,
- the write function can be used with several arguments to avoid
- actual concatenation:
-
- write("(",read(),")")
-
-
-
-
- - 2 -
-
-
-
-
-
-
-
-
- Substrings can be formed by subscripting strings with range
- specifications that indicate, by position, the desired range of
- characters. For example,
-
- middle := line[10:20]
-
- assigns to middle the string of characters of line between posi-
- tions 10 and 20. Similarly,
-
- write(line[2])
-
- writes the second character of line. The value 0 is used to
- refer to the position after the last character of a string. Thus
-
- write(line[2:0])
-
- writes the substring of line from the second character to the
- end, thus omitting the first character.
-
- An assignment can be made to the substring of string-valued
- variable to change its value. For example,
-
- line[2] := "..."
-
- replaces the second character of line by three dots. Note that
- the size of line changes automatically.
-
- There are many functions for analyzing strings. An example is
-
- find(s1,s2)
-
- which produces the position in s2 at which s1 occurs as a sub-
- string. For example, if the value of greeting is as given ear-
- lier,
-
- find("or",greeting)
-
- produces the value 8. See Section 4.2 for the handling of situa-
- tions in which s1 does not occur in s2, or in which it occurs at
- several different positions.
-
-
- 3. Character Sets
-
- While strings are sequences of characters, csets are sets of
- characters in which membership rather than order is significant.
- Csets are represented literally using single enclosing quotation
- marks, as in
-
- vowels := 'aeiouAEIOU'
-
- Two useful built-in csets are &lcase and &ucase, which consist of
- the lowercase and uppercase letters, respectively. Set opera-
- tions are provided for csets. For example,
-
-
-
- - 3 -
-
-
-
-
-
-
-
-
- letters := &lcase ++ &ucase
-
- forms the cset union of the lowercase and uppercase letters and
- assigns the resulting cset to letters, while
-
- consonants := letters -- 'aeiouAEIOU'
-
- forms the cset difference of the letters and the vowels and
- assigns the resulting cset to consonants.
-
- Csets are useful in situations in which any one of a number of
- characters is significant. An example is the string analysis
- function
-
- upto(c,s)
-
- which produces the position s at which any character in c occurs.
- For example,
-
- upto(vowels,greeting)
-
- produces 2. Another string analysis function that uses csets is
-
- many(c,s)
-
- which produces the position in s after an initial substring con-
- sisting only of characters that occur in s. An example of the
- use of many is in locating words. Suppose, for example, that a
- word is defined to consist of a string of letters. The expres-
- sion
-
- write(line[1:many(letters,line)])
-
- writes a word at the beginning of line. Note the use of the posi-
- tion returned by a string analysis function to specify the end of
- a substring.
-
-
- 4. Expression Evaluation
-
- 4.1 Conditional Expressions
-
- In Icon there are conditional expressions that may succeed and
- produce a result, or may fail and not produce any result. An
- example is the comparison operation
-
- i > j
-
- which succeeds (and produces the value of j) provided that the
- value of i is greater than the value of j, but fails otherwise.
-
- The success or failure of conditional operations is used
- instead of Boolean values to drive control structures in Icon. An
- example is
-
-
-
- - 4 -
-
-
-
-
-
-
-
-
- if i > j then k := i else k := j
-
- which assigns the value of i to k if the value of i is greater
- than the value of j, but assigns the value of j to k otherwise.
-
- The usefulness of the concepts of success and failure is
- illustrated by find(s1,s2), which fails if s1 does not occur as a
- substring of s2. Thus
-
- if i := find("or",line) then write(i)
-
- writes the position at which or occurs in line, if it occurs, but
- does not write a value if it does not occur.
-
- Many expressions in Icon are conditional. An example is
- read(), which produces the next line from the input file, but
- fails when the end of the file is reached. The following expres-
- sion is typical of programming in Icon and illustrates the
- integration of conditional expressions and conventional control
- structures:
-
- while line := read() do
- write(line)
-
- This expression copies the input file to the output file.
-
- If an argument of a function fails, the function is not
- called, and the function call fails as well. This "inheritance"
- of failure allows the concise formulation of many programming
- tasks. Omitting the optional do clause in while-do, the previous
- expression can be rewritten as
-
- while write(read())
-
-
- 4.2 Generators
-
- In some situations, an expression may be capable of producing
- more than one result. Consider
-
- sentence := "Store it in the neighboring harbor"
- find("or",sentence)
-
- Here or occurs in sentence at positions 3, 23, and 33. Most pro-
- gramming languages treat this situation by selecting one of the
- positions, such as the first, as the result of the expression. In
- Icon, such an expression is a generator and is capable of produc-
- ing all three positions.
-
- The results that a generator produces depend on context. In a
- situation where only one result is needed, the first is produced,
- as in
-
-
-
-
-
- - 5 -
-
-
-
-
-
-
-
-
- i := find("or",sentence)
-
- which assigns the value 3 to i.
-
- If the result produced by a generator does not lead to the
- success of an enclosing expression, however, the generator is
- resumed to produce another value. An example is
-
- if (i := find("or",sentence)) > 5 then write(i)
-
- Here the first result produced by the generator, 3, is assigned
- to i, but this value is not greater than 5 and the comparison
- operation fails. At this point, the generator is resumed and pro-
- duces the second position, 23, which is greater than 5. The com-
- parison operation then succeeds and the value 23 is written.
- Because of the inheritance of failure and the fact that com-
- parison operations return the value of their right argument, this
- expression can be written in the following more compact form:
-
- write(5 < find("or",sentence))
-
-
- Goal-directed evaluation is inherent in the expression evalua-
- tion mechanism of Icon and can be used in arbitrarily complicated
- situations. For example,
-
- find("or",sentence1) = find("and",sentence2)
-
- succeeds if or occurs in sentence1 at the same position as and
- occurs in sentence2.
-
- A generator can be resumed repeatedly to produce all its
- results by using the every-do control structure. An example is
-
- every i := find("or",sentence)
- do write(i)
-
- which writes all the positions at which or occurs in sentence.
- For the example above, these are 3, 23, and 33.
-
- Generation is inherited like failure, and this expression can
- be written more concisely by omitting the optional do clause:
-
- every write(find("or",sentence))
-
-
- There are several built-in generators in Icon. One of the most
- frequently used of these is
-
- i to j
-
- which generates the integers from i to j. This generator can be
- combined with every-do to formulate the traditional for-style
- control structure:
-
-
-
- - 6 -
-
-
-
-
-
-
-
-
- every k := i to j do
- f(k)
-
- Note that this expression can be written more compactly as
-
- every f(i to j)
-
-
- There are a number of other control structures related to gen-
- eration. One is alternation,
-
- expr1 | expr2
-
- which generates the results of expr1 followed by the results of
- expr2. Thus
-
- every write(find("or",sentence1) | find("or",sentence2))
-
- writes the positions of or in sentence1 followed by the positions
- of or in sentence2. Again, this sentence can be written more com-
- pactly by using alternation in the second argument of find:
-
- every write(find("or",sentence1 | sentence2))
-
-
- Another use of alternation is illustrated by
-
- (i | j | k) = (0 | 1)
-
- which succeeds if any of i, j, or k has the value 0 or 1.
-
-
- 5. String Scanning
-
- The string analysis and synthesis operations described in Sec-
- tions 2 and 3 work best for relatively simple operations on
- strings. For complicated operations, the bookkeeping involved in
- keeping track of positions in strings becomes burdensome and
- error prone. In such cases, Icon has a string scanning facility
- that is analogous in many respects to pattern matching in SNO-
- BOL4. In string scanning, positions are managed automatically and
- attention is focused on a current position in a string as it is
- examined by a sequence of operations.
-
- The string scanning operation has the form
-
- s ? expr
-
- where s is the subject string to be examined and expr is an
- expression that performs the examination. A position in the sub-
- ject, which starts at 1, is the focus of examination.
-
- Matching functions change this position. One matching func-
- tion, move(i), moves the position by i and produces the substring
-
-
-
- - 7 -
-
-
-
-
-
-
-
-
- of the subject between the previous and new positions. If the
- position cannot be moved by the specified amount (because the
- subject is not long enough), move(i) fails. A simple example is
-
- line ? while write(move(2))
-
- which writes successive two-character substrings of line, stop-
- ping when there are no more characters.
-
- Another matching function is tab(i), which sets the position
- in the subject to i and also returns the substring of the subject
- between the previous and new positions. For example,
-
- line ? if tab(10) then write(tab(0))
-
- first sets the position in the subject to 10 and then to the end
- of the subject, writing line[10:0]. Note that no value is writ-
- ten if the subject is not long enough.
-
- String analysis functions such as find can be used in string
- scanning. In this context, the string that they operate on is not
- specified and is taken to be the subject. For example,
-
- line ? while write(tab(find("or")))
- do move(2)
-
- writes all the substrings of line prior to occurrences of or.
- Note that find produces a position, which is then used by tab to
- change the position and produce the desired substring. The
- move(2) skips the or that is found.
-
- Another example of the use of string analysis functions in
- scanning is
-
- line ? while tab(upto(letters)) do
- write(tab(many(letters)))
-
- which writes all the words in line.
-
- As illustrated in the examples above, any expression may occur
- in the scanning expression. Unlike SNOBOL4, in which the opera-
- tions that are allowed in pattern matching are limited and
- idiosyncratic, string scanning is completely integrated with the
- rest of the operation repertoire of Icon.
-
-
- 6. Structures
-
- 6.1 Lists
-
- While strings are sequences of characters, lists in Icon are
- sequences of values of arbitrary types. Lists are created by
- enclosing the lists of values in brackets. An example is
-
-
-
-
- - 8 -
-
-
-
-
-
-
-
-
- car1 := ["buick","skylark",1978,2450]
-
- in which the list car1 has four values, two of which are strings
- and two of which are integers. Note that the values in a list
- need not all be of the same type. In fact, any kind of value can
- occur in a list -- even another list, as in
-
- inventory := [car1,car2,car3,car4]
-
-
- Lists also can be created by
-
- a := list(i,x)
-
- which creates a list of i values, each of which has the value x.
-
- The values in a list can be referenced by position much like
- the characters in a string. Thus
-
- car1[4] := 2400
-
- changes the last value in car1 to 2400. A reference that is out
- of the range of the list fails. For example,
-
- write(car1[5])
-
- fails.
-
- The values in a list a are generated by !a. Thus
-
- every write(!a)
-
- writes all the values in a.
-
- Lists can be manipulated like stacks and queues. The function
- push(a,x) adds the value of x to the left end of the list a,
- automatically increasing the size of a by one. Similarly, pop(a)
- removes the leftmost value from a, automatically decreasing the
- size of a by one, and produces the removed value.
-
- A list value in Icon is a pointer (reference) to a structure.
- Assignment of a structure in Icon does not copy the structure
- itself but only the pointer to it. Thus the result of
-
- demo := car1
-
- causes demo and car1 to reference the same list. Graphs with
- loops can be constructed in this way. For example,
-
- node1 := ["a"]
- node2 := [node1,"b"]
- push(node1,node2)
-
-
-
-
-
- - 9 -
-
-
-
-
-
-
-
-
- constructs a structure that can be pictured as follows:
-
-
- node1 .->a--.
- | |
- | |
- node2 '--b<-'
-
-
-
- 6.2 Tables
-
- Icon has a table data type similar to that of SNOBOL4. Tables
- essentially are sets of pairs of values, an entry value and a
- corresponding assigned value. The entry and assigned values may
- be of any type, and the assigned value for any entry value can be
- looked up automatically. Thus tables provide a form of associa-
- tive access in contrast with the positional access to values in
- lists.
-
- A table is created by an expression such as
-
- symbols := table(x)
-
- which assigns to symbols a table with the default assigned value
- x. Subsequently, symbols can be referenced by any entry value,
- such as
-
- symbols["there"] := 1
-
- which assigns the value 1 to the thereth entry in symbols.
-
- Tables grow automatically as new entry values are added. For
- example, the following program segment produces a table contain-
- ing a count of the words that appear in the input file:
-
- words := table(0)
- while line := read() do
- line ? while tab(upto(letters)) do
- words[tab(many(letters))] +:= 1
-
- Here the default assigned value for each word is 0, as given in
- table(0), and +:= is an augmented assignment operation that
- increments the assigned values by one. There are augmented
- assignment operations for all binary operators.
-
- Tables can be converted to lists, so that their entry and
- assigned values can be accessed by position. This is done by
- sort(t), which produces a list of two-element lists from t, where
- each two-element list consists of an entry value and its
- corresponding assigned value. For example,
-
-
-
-
-
-
- - 10 -
-
-
-
-
-
-
-
-
- wordlist := sort(words)
- every pair := !wordlist do
- write(pair[1]," : ",pair[2])
-
- writes the words and their counts from words.
-
-
- 7. Procedures
-
- An Icon program consists of a sequence of procedure declara-
- tions. An example of a procedure declaration is
-
- procedure max(i,j)
- if i > j then return i else return j
- end
-
- where the name of the procedure is max and its formal parameters
- are i and j. The return expressions return the value of i or j,
- whichever is larger.
-
- Procedures are called like built-in functions. Thus
-
- k := max(*s1,*s2)
-
- assigns to k the size of the longer of the strings s1 and s2.
-
- A procedure also may suspend instead of returning. In this
- case, a result is produced as in the case of a return, but the
- procedure can be resumed to produce other results. An example is
- the following procedure that generates the words in the input
- file.
-
- procedure genword()
- local line, letters, words
- letters := &lcase ++ &ucase
- while line := read() do
- line ? while tab(upto(letters)) do {
- word := tab(many(letters))
- suspend word
- }
- end
-
- The braces enclose a compound expression.
-
- Such a generator is used in the same way that a built-in gen-
- erator is used. For example
-
- every word := genword() do
- if find("or",word) then write(word)
-
- writes only those words that contain the substring or.
-
-
-
-
-
-
- - 11 -
-
-
-
-
-
-
-
-
- 8. An Example
-
- The following program sorts graphs topologically.
-
- procedure main()
- local sorted, nodes, arcs, roots
- while nodes := read() do { # get next node list
- arcs := read() # get arc list
- sorted := "" # sorted nodes
- # get nodes without predecessors
- while *(roots := nodes -- snodes(arcs)) > 0 do {
- sorted ||:= roots # add to sorted nodes
- nodes --:= roots # delete these nodes
- arcs := delarcs(arcs,roots)# delete their arcs
- }
- if *arcs = 0 then write(sorted)# successfully sorted
- else write("graph has cycle")# cycle if node remains
- }
- end
-
-
- procedure snodes(arcs)
- local nodes
- nodes := ""
- arcs ? while move(1) do { # predecessor
- move(2) # skip "->"
- nodes ||:= move(1) # successor
- move(1) # skip ";"
- }
- return nodes
- end
-
-
- procedure delarcs(arcs,roots)
- local newarcs, node
- newarcs := ""
- arcs ? while node := move(1) do {# get predecessor node
- if many(roots,node) then move(4)# delete arc from root node
- else newarcs ||:= node || move(4)# else keep arc
- }
- return newarcs
- end
-
- Graph nodes are represented by single characters with a list of
- the nodes on one input line followed by a list of arcs. For exam-
- ple, the graph
-
-
-
-
-
-
-
-
-
-
-
- - 12 -
-
-
-
-
-
-
-
-
- .---------------.
- | |
- | |
- a------>b------>c
- | | |
- | | |
- | v |
- d------>e-------'
-
-
- is given as
-
- abcde
- a->b;a->c;b->c;b->e;d->a;d->e;e->c;
-
- for which the output is
-
- dabec
-
-
- The nodes are represented by csets and automatic type conver-
- sion is used to convert strings to csets and vice versa. Note
- the use of augmented assignment operations for concatenation and
- in the computation of cset differences.
-
- Acknowledgement
-
- Icon was designed by the the author in collaboration with Dave
- Hanson, Tim Korb, Cary Coutant, and Steve Wampler. The current
- implementation is largely the work of Cary Coutant and Steve
- Wampler with recent contributions by Bill Mitchell. Dave Hanson
- and Bill Mitchell also made several helpful suggestions on the
- presentation of material in this paper.
-
- References
-
-
- 1. Griswold, Ralph E., Poage, James F., and Polonsky, Ivan P.
- The SNOBOL4 Programming Language, second edition. Prentice-
- Hall, Inc., Englewood Cliffs, New Jersey. 1971.
-
- 2. Kernighan, Brian W. and Ritchie, Dennis M. The C Programming
- Language. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
- 1978.
-
- 3. Griswold, Ralph E. Differences Between Versions 2 and 5 of
- Icon, Technical Report TR 83-5, Department of Computer Sci-
- ence, The University of Arizona. 1983.
-
- 4. Griswold, Ralph E. and Griswold, Madge T. The Icon Programming
- Language. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
- 1983.
-
-
-
-
-
- - 13 -
-
-
-
-
-
-
-
-
- 5. Griswold, Ralph E. Extensions to Version 5 of the Icon Pro-
- gramming Language, Technical report, Department of Computer
- Science, The University of Arizona. 1985.
-
-
-
-
- Ralph E. Griswold
- Department of Computer Science
- The University of Arizona
-
- September 30, 1985
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 14 -
-
-